Abstract

In this paper, we consider the weighted online set k-multicover problem. In this problem,
we have a universe V of elements, a family $\mathcal{S}$ of subsets of V with a positive real cost for every
$S \in \mathcal{S}$, and a "coverage factor" (positive integer) k. A subset $\{i_0, i_1, \ldots\} \subseteq V$ of elements is
presented online in an arbitrary order. When each element $i_p$ is presented, we are also told the
collection of all (at least k) sets $\mathcal{S}_{i_p} \subseteq \mathcal{S}$ to which $i_p$ belongs, together with their costs, and we need to
select additional sets from $\mathcal{S}_{i_p}$, if necessary, such that our collection of selected sets contains at
least k sets that contain the element $i_p$. The goal is to minimize the total cost of the selected
sets^1. In this paper, we describe a new randomized algorithm for the online multicover problem
based on a randomized version of the winnowing approach of [15]. This algorithm generalizes
and improves some earlier results in [1, 2]. We also discuss lower bounds on competitive ratios
for deterministic algorithms for general k based on the approaches in [2].



1 Introduction



In this paper, we consider the Weighted Online Set k-multicover problem (abbreviated as WOSC$_k$)
defined as follows. We have a universe $V = \{1, 2, \ldots, n\}$ of elements, a family $\mathcal{S}$ of subsets of V
with a cost (positive real number) $c_S$ for every $S \in \mathcal{S}$, and a "coverage factor" (positive integer) k.
A subset $\{i_0, i_1, \ldots\} \subseteq V$ of elements is presented in an arbitrary order. When each element $i_p$ is
presented, we are also told the collection of all (at least k) sets $\mathcal{S}_{i_p} \subseteq \mathcal{S}$ to which $i_p$ belongs, and
we need to select additional sets from $\mathcal{S}_{i_p}$, if necessary, such that our collection of sets contains at
least k sets that contain the element $i_p$. The goal is to minimize the total cost of the selected sets.
The special case of k = 1 will be simply denoted by WOSC (Weighted Online Set Cover). The
unweighted versions of these problems, in which the cost of every set is one, will be denoted by OSC$_k$ and
OSC, respectively.



*A preliminary version of these results appeared in 9th Workshop on Algorithms and Data Structures, F. Dehne,
A. Lopez-Ortiz and J. R. Sack (editors), LNCS 3608, pp. 110-121, 2005. 
^Supported by NSF grant CCR-0208821. 

* Supported in part by NSF grants DBI-0543365, IIS-0612044 and IIS-0346973. 

^1 Our algorithm and competitive ratio bounds can be extended to the case when a set can be selected at most a
prespecified number of times instead of just once; we do not report these extensions for simplicity and also because
they have no relevance to the biological applications that motivated our work.

The performance of any online algorithm can be measured by the competitive ratio, i.e., the
ratio of the total cost of the online algorithm to that of an optimal offline algorithm that knows the 
entire input in advance; for randomized algorithms, we measure the performance by the expected 
competitive ratio, i.e., the ratio of the expected cost of the solution found by our algorithm to the 
optimum cost computed by an adversary that knows the entire input sequence and has no limits 
on computational power, but who is not familiar with our random choices. 

The following notations will be used uniformly throughout the rest of the paper unless otherwise 
stated explicitly: 

• V is the universe of elements; 

• $m = \max_{i \in V} |\{S \in \mathcal{S} : i \in S\}|$ is the maximum frequency, i.e., the maximum number of sets to
which any element of V belongs;

• $d = \max_{S \in \mathcal{S}} |S|$ is the maximum set size;

• k is the coverage factor;

• e is the base of natural logarithm. 

None of m, d or |V| is known to the online algorithm in advance. 
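
As a small illustration of the parameters m and d defined above, the following sketch computes them for a toy family of sets. The family, the set names and the helper name `max_frequency_and_set_size` are hypothetical and chosen only for this example; the online algorithm itself never has access to these values in advance.

```python
from collections import Counter

def max_frequency_and_set_size(family):
    """Return (m, d) for a family given as a dict: set name -> set of elements.

    m is the largest number of sets that any single element belongs to,
    d is the size of the largest set.
    """
    freq = Counter(e for members in family.values() for e in members)
    m = max(freq.values())
    d = max(len(members) for members in family.values())
    return m, d

# Hypothetical toy family over the universe {1, 2, 3, 4}.
family = {"S1": {1, 2, 3}, "S2": {2, 3}, "S3": {3, 4}, "S4": {1, 4}}
m, d = max_frequency_and_set_size(family)
print(m, d)
```

Here element 3 belongs to three sets and the largest set has three elements, so both parameters equal 3 for this family.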

1.1 Motivations and applications 

One of our main motivations for investigating these problems, especially for large values of the
"coverage factor", is their application to reverse engineering problems in systems biology. However,
other applications have also been noted in the literature, and below we mention one such
application in addition to the biological motivations.

1.1.1 Client/server protocols [2] 

This application is modeled by the problem WOSC, in which there is a network of servers, clients
arrive one-by-one in arbitrary order, and each client can be served by a subset of the servers based
on their geographical distance from the client. The extension to WOSC$_k$ handles the scenario in
which a client must be attended to by at least a minimum number of servers for, say, reliability,
robustness and improved response time. In addition, in our motivation, we want a distributed
algorithm for the various servers, namely an algorithm in which each server locally decides about
the requests without communicating with the other servers or knowing their actions (and thus,
for example, is not allowed to maintain a potential function based on a subset of the servers such as
in [2]).

1.1.2 Reverse engineering of gene/protein networks [4, 7, 8, 10, 13, 14, 18, 20] 

We briefly explain this motivation here due to lack of space; the reader may consult the references for 
more details. This motivation concerns unraveling (or "reverse engineering") the web of interactions 
among the components of complex protein and genetic regulatory networks by observing global 
changes to derive interactions between individual nodes. In this application our attention is focused 
solely on one such approach, originally described in [13, 14], further elaborated upon in [4, 18],
and reviewed in [10, 20]. Here one assumes that the time evolution of a vector of state variables
$x(t) = (x_1(t), \ldots, x_n(t))$ is described by a system of differential equations:

$$\frac{dx_i}{dt} = f_i(x_1, \ldots, x_n, p_1, \ldots, p_m), \qquad i = 1, \ldots, n,$$

where $p = (p_1, \ldots, p_m)$ is a vector of parameters, such as levels of hormones or of enzymes, whose
half-lives are long compared to the rate at which the variables evolve and which can be manipulated
but remain constant during any given experiment. The components $x_i(t)$ of the state vector
represent quantities that can be in principle measured, such as levels of activity of selected proteins
or transcription rates of certain genes. There is a reference value $\bar{p}$ of p, which represents "wild
type" (that is, normal) conditions, and a corresponding steady state $\bar{x}$ of x, such that $f(\bar{x}, \bar{p}) = 0$.
We are interested in obtaining information about the Jacobian of the vector field f evaluated at
$(\bar{x}, \bar{p})$, or at least about the signs of the derivatives $\partial f_i/\partial x_j(\bar{x}, \bar{p})$. For example, if $\partial f_i/\partial x_j > 0$,
this means that $x_j$ has a positive (catalytic) effect upon the rate of formation of $x_i$. To be more
precise, the goal is to find as much information as possible about an unknown matrix $A \in \mathbb{R}^{n \times n}$,
which is the Jacobian matrix $\partial f/\partial x$. The critical assumption is that, while we may not know the
form of f, we often do know that certain parameters $p_j$ do not directly affect certain variables $x_i$.
This amounts to a priori biological knowledge of specificity of enzymes and similar data. Such
knowledge can be summarized by a binary matrix $C = (c_{ij}) \in \{0, 1\}^{n \times m}$, where "$c_{ij} = 1$" means
that $p_j$ does not appear in the equation for $x_i$, that is, $\partial f_i/\partial p_j = 0$. In our current context, each
row of C corresponds to an element, each column of C corresponds to a set, and the 0-1 entries indicate
the memberships of elements in sets. A crucial contribution of the above-mentioned references in
this context is as follows. Suppose that we solve this set-multicover instance in which each element
is covered at least some $\beta$ times. Then with $\beta = n - 1$ we can recover the elements of A uniquely
up to a scalar multiple (and thus can know the signs of the derivatives $\partial f_i/\partial x_j(\bar{x}, \bar{p})$ precisely),
and with $\beta = n - k$ for some small k we can recover the elements of A up to a modest ambiguity
that can be tolerated in practice. If the corresponding experimental protocols are carried out using
measurements via suitable biological reporting mechanisms, such as fluorescent proteins, in an
online fashion, one arrives at the online set multicover problems discussed in this paper.



1.2 Summary of prior work 

Offline versions WSC$_k$ and SC$_k$ of the problems WOSC$_k$ and OSC$_k$, in which all the |V| elements
are presented at the same time, have been well studied in the literature. The following is a brief summary
of some of the results about these problems. Assuming $NP \not\subseteq DTIME(n^{\log \log n})$, the SC$_1$
problem in general cannot be approximated to within a factor of $(1 - \varepsilon) \ln |V|$ for any constant
$0 < \varepsilon < 1$ in polynomial time [11] and cannot be approximated to within a factor of $\ln d - O(\ln \ln d)$
in polynomial time when restricted to set-cover instances with maximum set size d for all sufficiently
large d unless P = NP. On the other hand, an instance of the SC$_k$ problem can be $(1 + \ln d)$-approximated
in $O(|V| \cdot |\mathcal{S}| \cdot k)$ time by a simple greedy heuristic that, at every step, selects a
new set that covers the maximum number of those elements that have not been covered at least
k times yet [12, 22]; these results were recently improved upon in [7], which provided a randomized
approximation algorithm with an expected performance ratio of $(1 + o(1)) \ln(d/k)$ when d/k is at
least about $e^2 \approx 7.39$, and for smaller values of d/k the expected performance ratio was about $1 + 2\sqrt{d/k}$.

Regarding previous results for the online versions, the authors in [1, 2] considered the WOSC
problem and provided both deterministic and simple randomized algorithms with a competitive
ratio or expected competitive ratio of $O(\log m \log |V|)$ and an almost matching lower bound of
$\Omega\left(\frac{\log |\mathcal{S}| \log |V|}{\log \log |\mathcal{S}| + \log \log |V|}\right)$ on the competitive ratio of any deterministic algorithm for almost all values^2
of |V| and $|\mathcal{S}|$. The authors of [5] provided an efficient randomized online approximation algorithm
and a corresponding matching lower bound (for any randomized algorithm) for a different version
of the online set-cover problem in which one is allowed to pick at most k sets for a given k and the
goal is to maximize the number of presented elements for which at least one set containing them
was selected on or before the element was presented. Concurrent to our conference publication,
Alon, Azar and Gutner [3] considered the weighted online set-cover problem with repetitions, which
is studied in the bigger context of the admission control problem in general networks. Here, an element
can be presented multiple times and, if the element is presented k times, our goal is to cover it by at
least k different sets. For this problem, [3] contains a randomized $O(\log m \log |V|)$-competitive algorithm
as well as a deterministic bi-criteria approximation algorithm, i.e., a deterministic algorithm
that has a competitive ratio of $O(\log m \log |V|)$ and covers an element by at least $(1 - \varepsilon)k$ different
sets for any fixed $\varepsilon > 0$; it is easy to see that these bounds carry over to the problem WOSC$_k$. Conversely,
it is not difficult to see that our algorithm A-Universal and its analysis can easily be adapted
to this problem to achieve an expected competitive ratio of $\log_2 m \ln d + O(\log_2 m + \ln d)$ with arbitrary
set weights; one would need to modify appropriate places of Section 3.4. For unweighted
sets, via Corollary 2(b), Algorithm A-Universal provides an improved expected competitive ratio
of "roughly" (neglecting small constants) $\max\left\{5 \log_2 m,\ \log_2 m \ln \frac{d}{k \log_2 m}\right\}$, and the constants
involved in this bound are further improved in Theorem 10.



1.3 Summary of our results and techniques 

Let r(m, d, k) denote the competitive ratio of any online algorithm for WOSC$_k$ as a function of m,
d and k. In this paper, we describe a new randomized algorithm for the online multicover problem 
based on a randomized version of the winnowing approach of [15]. Our main contributions are then 
as follows: 

• We first provide a uniform analysis of our algorithm for all cases of the online set multicover 
problems. As a corollary of our analysis, we observe the following. 

- For OSC, WOSC and WOSC$_k$ our randomized algorithm has E[r(m, d, k)] equal to
$\log_2 m \ln d$ plus small lower order terms. While the authors in [1, 2] did provide a deterministic
algorithm and a simple randomized algorithm for WOSC with a competitive
ratio and an expected competitive ratio of $O(\log m \log |V|)$, respectively, the improvements
of our approach and analysis are as follows:

* We provide better constant factors and lower-order terms. Note that tight analysis
of the approximability or inapproximability bounds for set-cover type problems involving
tight estimates of the constants and lower-order terms is not a new idea; for
example, see [6, 7, 17, 19, 21].

* We use the maximum set size d rather than the larger universe size |V| in the
competitive ratio bound.

* For large coverage factor k (the case of utmost importance in our applications to systems
biology in Section 1.1.2), our uniform analysis via the quantity $\kappa$ (see Section 3)
provides an expected competitive ratio of roughly

$$\max\left\{5 \log_2 m,\ \log_2 m \ln \frac{d}{\max\{1, k/c\} \log_2 m}\right\}$$

where $c \ge 1$ is the ratio of the largest to the smallest weight among the sets in an
optimal solution. This provides a smooth transition of the expected competitive
ratio between "roughly" $\log_2 m \ln d$ plus small lower order terms for WOSC$_k$ when
the weights are arbitrary positive numbers to $\max\left\{5 \log_2 m,\ \log_2 m \ln \frac{d}{k \log_2 m}\right\}$
for OSC$_k$ when all the weights are the same.

* As a corollary of the above, for (the unweighted version) OSC$_k$ for general k the
expected competitive ratio E[r(m, d, k)] decreases logarithmically with decreasing
d/k with a value of roughly $5 \log_2 m$ in the limit^3 for all sufficiently large k.

^2 To be precise, when $\log_2 |V| \le |\mathcal{S}| \le e^{|V|^{1/2 - \delta}}$ for any fixed $\delta > 0$; we will refer to similar bounds as "almost all
values" of these parameters in the sequel.

• We next provide an improved analysis of E[r(m, d, 1)] for OSC with better constants.

• We next provide an improved analysis of E[r(m, d, k)] for OSC$_k$ with better constants and
asymptotic limit for large k. The case of large k is important for its application in reverse
engineering of biological networks as outlined in Section 1.1. More precisely, we show that
E[r(m, d, k)] is at most $\left(\frac{1}{2} + \log_2 m\right) \cdot \left(2 \ln \frac{d}{k} + 3.4\right) + 1 + 2 \log_2 m$ if $k \le (2e) \cdot d$ and at most
$1 + 2 \log_2 m$ otherwise.

• Finally, we discuss lower bounds on competitive ratios for deterministic algorithms for OSC$_k$
and WOSC$_k$ for general k using the approaches in [2]. The lower bounds obtained are
$\Omega\left(\max\left\{1,\ \frac{\log \frac{|\mathcal{S}|}{k} \log \frac{|V|}{k}}{\log \log \frac{|\mathcal{S}|}{k} + \log \log \frac{|V|}{k}}\right\}\right)$ for OSC$_k$ and $\Omega\left(\frac{\log |\mathcal{S}| \log |V|}{\log \log |\mathcal{S}| + \log \log |V|}\right)$ for WOSC$_k$ for many values of
the parameters.



1.4 Comparison With Previous Work 

The structure of our algorithm is similar to, and the analysis method of our algorithm is motivated
by, the implicit randomized algorithm (which was subsequently derandomized) in the paper "The
online set cover problem" by Alon et al. [2].

For every set we maintain a number that will guide the process of selection; we use ap[S], Alon
et al. use $w_S$. When a new element is received, and it is not covered (or sufficiently covered) yet,
in both papers this number is multiplied by a constant if the new element belongs to S (in the
weighted case, this number is incremented by a constant divided by $c_S$). The process of set selection
is a bit different: we simply select set S with probability that equals the increment of ap[S], while
in Alon et al. the procedure achieves a similar effect rather indirectly; it very much looks like a
de-randomization of our approach (we knew their approach when we worked on ours, so ours was
a de-de-randomization).

The analysis of Alon et al. uses an ingenious potential function, while we use three classes of
accounts. In either case, this is a form of amortized analysis. The two approaches offer distinct
advantages. Alon et al. had a much shorter proof and could obtain a de-randomized version. As
our choices were more explicitly related to Poisson trials, we applied our own versions of Chernoff
bounds to tighten the analysis considerably.

^3 Notice the similarity of this dependence of the expected competitive ratio on d/k to that in our results in [7] for
the offline version of the problem where we provided an approximation algorithm with an expected performance ratio
of about $\max\left\{(1 + o(1)) \ln(d/k),\ 1 + 2\sqrt{d/k}\right\}$.

A fractional solution to the set cover problem is implicit in these solutions, as the "guiding numbers"
can be interpreted as fractional choices, and making the "guided choices" can be interpreted
as rounding. However, neither our analysis nor that of Alon et al. uses that fact explicitly.

2 A Generic Randomized Winnowing Algorithm 

We first describe a generic randomized winnowing algorithm A-Universal in Fig. 1. The
winnowing algorithm has two scaling factors: a multiplicative scaling factor $\mu/c_S$ that depends on
the particular set S containing i and another additive scaling factor $|\mathcal{S}_i|^{-1}$ that depends on the
number of sets that contain i. These scaling factors quantify the appropriate level of "promotion"
in the winnowing approach. In the next few sections, we will analyze this algorithm for the
various online set-multicover problems. The following notations will be used uniformly throughout
the analysis:

• $\mathcal{J} \subseteq V$ is the set of elements received in a run of the algorithm.

• T* is an optimum solution.

2.1 A Guided Tour — Rough Sketch of the Analysis of A-Universal for the 
Unweighted Case 

We first sketch the overall analysis of A-Universal for the case when every set has cost 1 to 
provide the reader an intuition behind the overall analysis of the algorithm. Bear in mind that this 
analysis is neither the most precise nor the simplest, but it can be extended to the general case. In 
particular we may overestimate or underestimate the constants slightly in the description to omit 
tedious details in favor of providing a better intuition. 

Since the function Stat always returns 1, we can remove line A4 and simplify line A6 to
$p[S] \leftarrow \mathrm{ap}[S] + |\mathcal{S}_i|^{-1}$.

The cost of handling an element i by A-Universal is the number of sets that are selected. The
analysis is conditional on the quantity $s = \xi(i)$, where $\xi(i)$ is the sum of ap[S]'s over $S \in \mathcal{S}_i - T^*$ at the
time when i is received, and we take the worst case over all possible values of s. We define the event E(b)
that exactly b sets from $\mathcal{S}_i - T^*$ were already selected before element i was received. Note that
these selections were successes in Poisson trials that have sum of probabilities s, so the probability
of E(b) can be expressed as some p(s, b), e.g., using Lemma 13.

The cost is split into three components: (i) selections of sets from T*, (ii) selections from $\mathcal{S}_i - T^*$
made in lines A8-9, and (iii) selections from $\mathcal{S}_i - T^*$ made in lines A11-12.

Selections of type (i) are charged to account(T*); obviously the final value of this account
contributes at most 1 to the competitive ratio.

Rather than paying for the actual cost of selections of type (ii) and (iii), we pay the expected cost
of these selections, and on average we will be paying enough. We estimate this cost as s + deficit,
and we pay it as follows: we charge a fixed amount $1 + \psi$ to every account(S) such that $S \in \mathcal{S}_i \cap T^* - T$,
and the left-over cost is charged to account(i).

The contribution of account(S) to the competitive ratio is the ratio of the expected final value of
account(S) to the portion of c(T*) attributed to S, and the latter happens to be 1 (in the unweighted
case!). Thus this contribution is $(1 + \psi)\beta$ where $\beta$ is the expected number of times we can charge
account(S). We introduce the function $\Lambda(S)$ to estimate $\beta$. The initial value of $\Lambda(S)$ is $\log_2 1 = 0$.
When we charge account(S) after receiving element i, the value of ap[S] increases from some x to at
least $x + x + m^{-1}$, so $mx + 1$ increases to at least $2mx + 2$, so $\Lambda(S)$ increases by at least 1, except
when ap[S] increases to x + 1 and S is deterministically selected, in which case the increase may be smaller. The average final value of
$\Lambda(S)$ is at most $\log_2 m$ (cf. Lemma 4). Thus the accounts account(S) contribute roughly $(1 + \psi) \log_2 m$ to the
competitive ratio.

F1   function Stat(B, j)
F2       A ← ∅
F3       while (|A| < j) do                     // select j least cost sets from B //
F4           S ← least cost set from B - A; insert S to A
F5       return c_S                             // return the cost of the last selected set //

     // definition //
D1   for (i ∈ V)
D2       𝒮_i ← {S ∈ 𝒮 : i ∈ S}

     // initialization //
I1   T ← ∅                                      // T is our collection of selected sets //
I2   for (S ∈ 𝒮)
I3       ap[S] ← 0                              // accumulated probability of each set //

     // after receiving an element i //
A1   deficit ← k - |𝒮_i ∩ T|                    // k is the coverage factor //
A2   if deficit ≤ 0                             // we need deficit more sets for i //
A3       finish the processing of i
A4   μ ← Stat(𝒮_i - T, deficit)
A5   for (S ∈ 𝒮_i - T)
A6       p[S] ← (μ / c_S) (ap[S] + |𝒮_i|^{-1})  // probability for this step //
A7       ap[S] ← ap[S] + p[S]                   // accumulated probability //
A8       with probability min{p[S], 1}
A9           insert S to T                      // randomized selection //
A10  deficit ← k - |𝒮_i ∩ T|
A11  repeat deficit times                       // greedy selection //
A12      insert a least cost set from 𝒮_i - T to T

Figure 1: Algorithm A-Universal
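
The pseudocode of Fig. 1 can be turned into a short executable sketch. The version below is only an illustration under stated assumptions: the class name `AUniversal`, the method names, the tie-breaking by set name and the toy instance are all hypothetical and not part of the paper; comments map each statement to the line labels of Fig. 1. It also assumes, as the problem statement does, that every received element belongs to at least k sets.

```python
import random

def stat(costs, j):
    """Function Stat: cost of the j-th cheapest set among `costs` (j >= 1)."""
    return sorted(costs)[j - 1]

class AUniversal:
    """Sketch of Algorithm A-Universal; labels in comments refer to Fig. 1."""

    def __init__(self, cost, k, rng=None):
        self.cost = dict(cost)                 # set name -> cost c_S > 0
        self.k = k                             # coverage factor
        self.T = set()                         # I1: selected sets
        self.ap = {S: 0.0 for S in cost}       # I2-I3: accumulated probabilities
        self.rng = rng if rng is not None else random.Random()

    def receive(self, sets_of_i):
        """Process element i, given the names of all (at least k) sets containing it."""
        S_i = set(sets_of_i)
        deficit = self.k - len(S_i & self.T)                          # A1
        if deficit <= 0:                                              # A2-A3
            return
        candidates = sorted(S_i - self.T)       # fixed order, for reproducibility
        mu = stat([self.cost[S] for S in candidates], deficit)        # A4
        for S in candidates:                                          # A5
            p = (mu / self.cost[S]) * (self.ap[S] + 1.0 / len(S_i))   # A6
            self.ap[S] += p                                           # A7
            if self.rng.random() < min(p, 1.0):                       # A8
                self.T.add(S)                                         # A9
        deficit = self.k - len(S_i & self.T)                          # A10
        cheapest = sorted(S_i - self.T, key=lambda S: (self.cost[S], S))
        for S in cheapest[:max(deficit, 0)]:                          # A11-A12
            self.T.add(S)

    def total_cost(self):
        return sum(self.cost[S] for S in self.T)

# Hypothetical toy instance (not from the paper): four sets, three elements, k = 2.
cost = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 1.5}
membership = {1: ["A", "B", "C"], 2: ["B", "C", "D"], 3: ["A", "D"]}
alg = AUniversal(cost, k=2, rng=random.Random(0))
for i in (1, 2, 3):
    alg.receive(membership[i])
print(sorted(alg.T), alg.total_cost())
```

Whatever the random choices are, the greedy step in lines A11-12 guarantees that each received element ends up covered by at least k selected sets; only the total cost depends on the coin flips.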

Note that there must be at least deficit many sets in $\mathcal{S}_i \cap T^* - T$, so the term 1 in $1 + \psi$ surely
covers the cost of selections of type (iii). If there are b such sets and $s > b\psi$, we charge $s - b\psi$
to account(i). To find the contribution of account(i) to the competitive ratio we must ascribe part
of c(T*) to i and estimate the final value of account(i). If we have received b elements so far,
$c(T^*) \ge kb/d$, so we can ascribe k/d to i.

Note that we make only one charge to account(i). How can we estimate this charge under
condition E(j - 1)? First, because j - 1 "incorrect" sets were already selected, deficit would be 0
if only j - 1 "correct" sets remained unselected, so the charges are 0 unless we have at least j
unselected "correct" sets. Thus under condition E(j - 1), if we make any charges at all, at least
$j\psi$ was charged to the accounts account(S) to cover the average cost of selections of type (ii). Thus under
condition E(b) we charge at most $s - (b + 1)\psi$ to account(i). As we estimate the probability of
E(b) with p(s, b), we can estimate the average final value of account(i) as $\sum_{j=1}^{\lceil s/\psi \rceil - 1} p(s, j - 1)\,(s - j\psi)$.
Using Lemma 13, one can show that $\psi = \max\{2, \ln(k/d)\}$ assures that account(i) does not contribute
more than a $\log_2 m$ factor to the competitive ratio.
tribute more than a log2 m factor to the competitive ratio. 



3 A Uniform Analysis of Algorithm A-Universal

In this section, we present a uniform analysis of Algorithm A-Universal that applies to all versions
of the online set multicover problems, i.e., OSC, OSC$_k$, WOSC and WOSC$_k$. Abusing notations
slightly, define $c(\mathcal{S}') = \sum_{S \in \mathcal{S}'} c_S$ for any subcollection of sets $\mathcal{S}' \subseteq \mathcal{S}$. Our bound on the competitive
ratio will be influenced by the parameter $\kappa$ defined as:

$$\kappa = \min_{i \in \mathcal{J},\ S \in \mathcal{S}_i \cap T^*} \left\{ \frac{c(\mathcal{S}_i \cap T^*)}{c_S} \right\}.$$

It is easy to check that

$$\kappa \ \begin{cases} = 1 & \text{for OSC} \\ = k & \text{for OSC}_k \\ \ge 1 & \text{for WOSC and WOSC}_k \end{cases}$$

The main result proved in this section is the following theorem.

Theorem 1 The expected competitive ratio E[r(m, d, k)] of Algorithm A-Universal is at most

$$1 + \log_2 m \times \max\left\{5,\ 2 + \ln \frac{d}{\kappa \log_2 m}\right\}.$$

Corollary 2

(a) For OSC, WOSC and WOSC$_k$, setting $\kappa = 1$ we obtain E[r(m, d, k)] to be at most $\log_2 m \ln d$
plus lower order terms.

(b) For OSC$_k$, setting $\kappa = k$, we obtain E[r(m, d, k)] to be at most

$$1 + \log_2 m \times \max\left\{5,\ 2 + \ln \frac{d}{k \log_2 m}\right\} \approx \log_2 m \times \max\left\{5,\ \ln \frac{d}{k \log_2 m}\right\}.$$

(c) Let $c \ge 1$ be the ratio of the largest to the smallest weight among the sets in an optimal solution.
Then, setting $\kappa = \max\{1, k/c\}$, we obtain E[r(m, d, k)] to be at most

$$1 + \log_2 m \times \max\left\{5,\ 2 + \ln \frac{d}{\max\{1, k/c\} \log_2 m}\right\} \approx \log_2 m \times \max\left\{5,\ \ln \frac{d}{\max\{1, k/c\} \log_2 m}\right\}.$$

In the next few subsections we prove the above theorem. 
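
To make the role of the parameter $\kappa$ concrete, the following sketch computes $\kappa$ for a small instance and evaluates the bound of Theorem 1 as read here, namely $1 + \log_2 m \times \max\{5, 2 + \ln(d/(\kappa \log_2 m))\}$. The function names `kappa` and `theorem1_bound` and the toy instance are hypothetical; the sketch assumes $m \ge 2$ so that $\log_2 m > 0$, and it presupposes that an optimal solution T* is given, which is only possible in an offline illustration.

```python
import math

def kappa(cost, membership, T_star):
    """kappa = min over received i and S in S_i ∩ T* of c(S_i ∩ T*) / c_S."""
    best = math.inf
    for sets_of_i in membership.values():
        common = [S for S in sets_of_i if S in T_star]
        total = sum(cost[S] for S in common)
        for S in common:
            best = min(best, total / cost[S])
    return best

def theorem1_bound(m, d, kap):
    """1 + log2(m) * max{5, 2 + ln(d / (kappa * log2(m)))}, assuming m >= 2."""
    lg = math.log2(m)
    return 1.0 + lg * max(5.0, 2.0 + math.log(d / (kap * lg)))

# Hypothetical unweighted instance with k = 2: every element lies in exactly
# two sets of T*, so kappa equals k, as stated for OSC_k.
cost = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
membership = {1: ["A", "B", "C"], 2: ["B", "C", "D"]}
kap = kappa(cost, membership, T_star={"B", "C"})
print(kap, theorem1_bound(16, 1024, 1.0), theorem1_bound(16, 64, 4.0))
```

For m = 16 and d = 64 with $\kappa = 4$ the logarithmic term is below 5, so the bound is governed by the constant branch of the maximum; with d = 1024 and $\kappa = 1$ the logarithmic branch dominates.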



3.1 The overall scheme 

We first roughly describe the overall scheme of our analysis. The average cost of a run of A-Universal
is the sum of average costs that are incurred when elements $i \in \mathcal{J}$ are received. We
will account for these costs by dividing them into three parts $\mathrm{cost}_1 + \sum_{i \in \mathcal{J}} \mathrm{cost}_2^i + \sum_{i \in \mathcal{J}} \mathrm{cost}_3^i$
where:

$\mathrm{cost}_1 \le c(T^*)$ upper bounds the total cost incurred by the algorithm for selecting sets in $T \cap T^*$.
$\mathrm{cost}_2^i$ is the cost of selecting sets from $\mathcal{S}_i - T^*$ in line A9 for each $i \in \mathcal{J}$.
$\mathrm{cost}_3^i$ is the cost of selecting sets from $\mathcal{S}_i - T^*$ in line A12 for each $i \in \mathcal{J}$.

We will use the accounting scheme to count these costs by creating the following three types of
accounts:

account(T*);
account(S) for each set $S \in T^* - T$;
account(i) for each received element $i \in \mathcal{J}$.

$\mathrm{cost}_1$ obviously adds at most 1 to the average competitive ratio; we will charge this cost to
account(T*). The other two kinds of costs, namely $\mathrm{cost}_2^i + \mathrm{cost}_3^i$ for each i, will be distributed
to the remaining two accounts. Let $D = \frac{d}{\kappa \log_2 m}$. The distribution of charges to these two accounts
will satisfy the following:

• $\sum_{i \in \mathcal{J}} \mathrm{account}(i) \le \log_2 m \cdot c(T^*)$. This claim in turn will be satisfied by:

- dividing the optimal cost c(T*) into pieces $c_i(T^*)$ for each $i \in \mathcal{J}$ such that $\sum_{i \in \mathcal{J}} c_i(T^*) \le c(T^*)$; and

- showing that, for each $i \in \mathcal{J}$, $\mathrm{account}(i) \le \log_2 m \cdot c_i(T^*)$.

• $\sum_{S \in T^*} \mathrm{account}(S) \le \log_2 m \cdot \max\{4, \ln D + 1\} \cdot c(T^*)$.

This will obviously prove an expected competitive ratio of at most the maximum of $1 + 5(1 + \log_2 m)$
and $1 + (1 + \log_2 m)(2 + \ln D)$, as promised.

We will perform our analysis from the point of view of each received element $i \in \mathcal{J}$. To define
and analyze the charges we will define several quantities:

$\mu(i)$       the value of $\mu$ calculated in line A4 after receiving i
$\xi(i)$       the sum of ap[S]'s over $S \in \mathcal{S}_i - T^*$ at the time when i is received
$a(i)$         $|T \cap \mathcal{S}_i - T^*|$ at the time when i is received
$\Lambda(S)$   $\log_2(m \cdot \mathrm{ap}[S] + 1)$ for each $S \in \mathcal{S}$; it changes during the execution of A-Universal

Finally, let $\Delta(X)$ denote the amount of change (increase or decrease) of a quantity X when an
element i is processed.

3.2 The role of $\Lambda(S)$

We will ensure the invariant $\mathrm{account}(S) \le \max\{4, \ln D + 1\} \cdot \Lambda(S) \cdot c_S$ for every $S \in T^*$. We will simply
not accept larger charges to the accounts of sets than this invariant allows. This invariant is useful
because we will prove a universal upper bound U on the expected final value of $\Lambda(S)$, and thus the
contribution of the accounts of sets to the expected competitive ratio will be $\max\{4, \ln D + 1\} \cdot U$.

Definition 3 When we determine the charges to accounts made when element i is received, we
classify sets from $\mathcal{S}_i \cap T^* - T$ as heavy if $c_S \ge \mu(i)$ and light otherwise.

When i is received we charge accounts of $S \in T^* \cap \mathcal{S}_i - T$ in the following manner:

• for a light set, $\Delta(\mathrm{account}(S)) = c_S$ while we can show that $\Delta(\Lambda(S)) \ge 1$, and

• for a heavy set, $\Delta(\mathrm{account}(S)) = \max\{4, \ln D + 1\} \cdot \mu(i)$ while $\Delta(\Lambda(S)) \ge \mu(i)/c_S$.

The above estimates of $\Delta(\Lambda(S))$ are easy to show: in lines A6-7 we increment $\mathrm{ap}[S] + m^{-1}$ by

$$\frac{\mu(i)}{c_S}\left(\mathrm{ap}[S] + |\mathcal{S}_i|^{-1}\right) \ge \frac{\mu(i)}{c_S}\left(\mathrm{ap}[S] + m^{-1}\right),$$

which increments $\Lambda(S) = \log_2(\mathrm{ap}[S] + m^{-1}) + \log_2 m$ by at least $\log_2(1 + \mu(i)/c_S)$; for a light set
this increment is at least $\log_2 2 = 1$, and for a heavy set we have $\mu(i)/c_S \le 1$, and we use the
following fact:

$$\log_2(1 + x) \ge x \quad \text{for } 0 \le x \le 1.$$
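
The two estimates above can be checked numerically. The sketch below is an illustration only: the helper names `lam` and `lam_increment` and the grid of test values are hypothetical. It applies the update of lines A6-7 with the ratio $\mu(i)/c_S$ as a free parameter and verifies that $\Lambda(S)$ grows by at least $\log_2(1 + \mu(i)/c_S)$ whenever $|\mathcal{S}_i| \le m$, and that $\log_2(1 + x) \ge x$ on a grid of points in [0, 1].

```python
import math

def lam(ap, m):
    """Lambda(S) = log2(m * ap[S] + 1)."""
    return math.log2(m * ap + 1.0)

def lam_increment(ap, m, size_Si, ratio):
    """Increase of Lambda(S) caused by lines A6-A7, with ratio = mu(i) / c_S."""
    p = ratio * (ap + 1.0 / size_Si)          # line A6
    return lam(ap + p, m) - lam(ap, m)        # effect of line A7 on Lambda(S)

checks = []
for m in (2, 5, 10):
    for size_Si in range(1, m + 1):           # an element lies in at most m sets
        for ap in (0.0, 0.1, 0.5, 2.0):
            for ratio in (0.1, 0.5, 1.0, 3.0):
                inc = lam_increment(ap, m, size_Si, ratio)
                checks.append(inc >= math.log2(1.0 + ratio) - 1e-9)

# The elementary fact used for heavy sets, sampled on a grid of [0, 1].
fact_holds = all(math.log2(1.0 + t / 100.0) >= t / 100.0 - 1e-12 for t in range(101))
print(all(checks), fact_holds)
```

The small tolerances only absorb floating-point rounding; equality in the first estimate is attained when $|\mathcal{S}_i| = m$.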

Of course, such an approach makes sense only if we can prove an upper bound on $E[\Lambda(S)]$.
Note that in step A6 we may calculate a value of p[S] that is larger than 1.

We analyze $E[\Lambda(S)]$ from the following point of view: consider a fixed sequence of p[S] over the
execution of the algorithm; each time p[S] > 0 there is a chance that S gets selected and this is the
last step when $\Lambda(S)$ increases. Our bound will hold true for every possible sequence.

Lemma 4 $E[\Lambda(S)] \le \log_2 m$ for $m \ge 7$.

Proof. We want to find the expected final value of $\Lambda(S) = \log_2(m \cdot \mathrm{ap}[S] + 1) = \log_2 m + \log_2(\mathrm{ap}[S] + m^{-1})$.
It is a function of the sequence of probabilities, say $p_1, p_2, \ldots$, that were computed as p[S] when
elements of S were received.

We will be working with sequences formed from possible sequences of probabilities by deleting
an initial part; let z be the sum of this initial part and $m^{-1}$. We define $\beta p_i = z + \sum_{j=1}^{i-1} p_j$, which
stands for the value of $\mathrm{ap}[S] + m^{-1}$ in line A6 when we compute $p_i$. We say that $p = (p_1, p_2, \ldots)$
is z-legal if for $i \ge 1$ we have $0 \le p_i \le \beta p_i$, and if $p_i \ge 1$ then $p_i$ is the last term of p. Let
tail(p) = $(p_2, \ldots)$.

We define F(z, p) as follows. If p is an empty sequence then F(z, p) = 0, otherwise

$$F(z, p) = p_1 \log_2(p_1 + z) + (1 - p_1)\, F(z + p_1, \mathrm{tail}(p)) \qquad (*)$$

In turn, F(z) is the supremum value of F(z, p) over all z-legal sequences. Our goal is to show that
$F(1/m) \le 0$ for $m \ge 7$.

We first show that if the supremum defining F(z) is limited to infinite sequences, then it is finite.
By repetitively applying formula (*) we get

$$F(z, p) = \sum_{i=1}^{\infty} p_i \log_2(\beta p_{i+1}) \prod_{j=1}^{i-1} (1 - p_j) < \int_0^{\infty} e^{-x} \log_2(x + 1)\, dx$$

where the summation can be converted to an integral as follows: $p_i$ can be a sum of dx's over an
interval of length $p_i$, say from $\beta p_i$ to $\beta p_{i+1}$, the product can be the probabilistic density function
that can be bounded from above with $e^{-x}$, and $\log_2$ can be the function that we compute the expectation
of, and it can be estimated from above with $\log_2(x + 1)$; this justifies the estimate of F(z, p)
with a convergent integral.

Next we show that for $z \ge \log_2 e$ we have $F(z) = F(z, (z)) = 1 + \log_2 z$. Suppose that $F(z) > 1 + \log_2 z$.
Then for some finite p and for some $z \ge \log_2 e$ we have $F(z, p) > F(z, (z)) = 1 + \log_2 z$.
Consider a shortest such sequence. Because of (*) we can conclude that p has length 2, since
otherwise $F(z + p_1, \mathrm{tail}(p)) \le F(z + p_1, (z + p_1))$, but in that case we can replace tail(p) with the
single term $z + p_1$. So we can assume that p = (x, z + x) for some x > 0. Then we have

$$F(z, p) = x \log_2(z + x) + (1 - x) \log_2(z + x + z + x) > 1 + \log_2 z$$

which implies

$$x \log_2(z + x) + (1 - x)(1 + \log_2(z + x)) > 1 + \log_2 z$$

which implies

$$x \left(\log_2 z + \log_2 \frac{z + x}{z}\right) + (1 - x)\left(1 + \log_2 z + \log_2 \frac{z + x}{z}\right) > 1 + \log_2 z$$

which implies

$$\log_2 \frac{z + x}{z} > x.$$

The latter is not possible, because both sides vanish at x = 0 and, for x > 0 and $z \ge \log_2 e$, the derivative of the left-hand side is less than 1,
while the derivative of the right-hand side is 1.

In a z-legal sequence p we have $p_1 \le \min\{1, z\}$. As the third observation we can show that if p
has more than one term, then $p_1 + p_2 > \min\{1, z\}$, otherwise we increase F(z, p) when we coalesce
the first two terms of p into one. Let $p_1 = x$, $p_2 = y$, $p_1 + p_2 = p$; we have

$$x \log_2(z + x) + (1 - x)\, y \log_2(z + p) + (1 - x)(1 - y)\, F(z + p) \le p \log_2(z + p) + (1 - p)\, F(z + p)$$

which implies

$$x \left(\log_2 \frac{z + x}{z + p} + \log_2(z + p)\right) + (1 - x)\, y \log_2(z + p) + xy\, F(z + p) \le p \log_2(z + p)$$

which implies

$$x \log_2 \frac{z + x}{z + p} - xy \log_2(z + p) + xy\, F(z + p) \le 0$$

which implies

$$F(z + p) \le \log_2(z + p) + \frac{1}{y} \log_2\left(1 + \frac{y}{z + x}\right)$$

Because we always have $F(z) \le \log_2(z) + 1$, it suffices to show that $\frac{1}{y} \log_2\left(1 + \frac{y}{z + x}\right) \ge 1$. This
follows from the fact that for $x < \log_2 e$ the derivative of $\log_2 x$ is larger than 1.

The methods used to show the last two facts allow us to characterize the optimal (or worst case)
sequences: if $z \ge \log_2 e$, use the 1-term sequence consisting of z, otherwise start from $\min\{z, 1, \log_2 e - z\}$.

As a consequence, if $\frac{1}{2} \log_2 e \le z < \log_2 e$ then $F(z) = F(z, (\log_2 e - z, \log_2 e)) = \log_2 \log_2 e + 1 - \log_2 e + z$,
and for $z < \frac{1}{2} \log_2 e$ we know that $F(z) = z \log_2(2z) + (1 - z) F(2z)$. It is easy to see that
F(z/2) < F(z), and we can compute the values of F(1/m) for m = 2, 3, ..., 7:

m        1      2      3      4      5      6      7       8
F(1/m)   1.086  0.543  0.397  0.157  0.120  0.067  -0.016  -0.112



□ 
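
The recurrence (*) and the worst-case sequences described in the proof can be evaluated numerically. The sketch below is an illustration under explicit assumptions: the function names `F` and `worst_case_sequence` are hypothetical, and a term $p_1 \ge 1$ is treated as a certain selection (probability capped at 1, in the spirit of line A8), which is the reading under which $F(z, (z)) = 1 + \log_2 z$. With these assumptions the sketch reproduces, to three decimals, the table entries for m = 1, 2, 4 and 8; the other entries are not checked here.

```python
import math

LOG2E = math.log2(math.e)

def F(z, p):
    """Evaluate recurrence (*) for a finite sequence p (a list of floats)."""
    if not p:
        return 0.0
    p1 = p[0]
    q = min(p1, 1.0)  # assumption: selection probability capped at 1, as in line A8
    return q * math.log2(p1 + z) + (1.0 - q) * F(z + p1, p[1:])

def worst_case_sequence(z):
    """Sequence described in the proof: repeat the step min{z, 1, log2(e) - z}
    until the running value reaches log2(e), then end with that value."""
    seq = []
    while z < LOG2E - 1e-12:
        step = min(z, 1.0, LOG2E - z)
        seq.append(step)
        z += step
    seq.append(z)
    return seq

for m in (1, 2, 4, 8):
    z = 1.0 / m
    print(m, round(F(z, worst_case_sequence(z)), 3))
```

For $z \ge \log_2 e$ the sequence has the single term z, and the function returns $\log_2(2z) = 1 + \log_2 z$, in agreement with the second observation of the proof.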

Observe that it is very easy to show the competitive ratio of m, so for m = 1 it makes no sense
to discuss the competitive ratio, while for $1 < m \le 16$, since $4 \log_2 m \ge m$, the upper bound we
are proving is trivial.

3.3 Charges due to the costs of line A12

When we make greedy selections in line A12, there are at least deficit many sets in $\mathcal{S}_i \cap \mathcal{T}^* - \mathcal{T}$; we can order them according to their costs, say $S_1, S_2, \ldots$, and let $c_{S_j} = a_j$. Because we could make greedy selections of these sets, the costs of actual selections cannot be larger, so if these costs are ordered $b_1 \le \ldots \le b_{\mathrm{deficit}}$, we have $b_j \le a_j$ for $j = 1, \ldots, \mathrm{deficit}$.

Therefore we can charge $b_j$ to account($S_j$) and the expected sum of such charges made to each account($S$) is at most $c_S \cdot \log_2 m$. Therefore these charges contribute $\log_2 m$ to the expected competitive ratio.
3.4 Charges due to the costs of line A9

The expected sum of charges due to the costs of line A9 equals $\mu(i)\xi(i) + \mu(i)$: every set $S$ from $\mathcal{S}_i - \mathcal{T} - \mathcal{T}^*$ contributes, regardless of its weight, $\mu(i)(\alpha p[S] + |\mathcal{S}_i|^{-1})$; the $\alpha p[S]$ terms add to $\xi(i)$ while the $|\mathcal{S}_i|^{-1}$ terms add to 1. We will refer to these two terms as A9a charges and A9b charges.

A9b charges will be given to an arbitrary account of a heavy set (in the worst case, there is 
only one). 

A9a charges are distributed among the accounts of heavy sets and account(i). The idea is the following: we will fix the A9a charge to each heavy set account to some $\psi$ such that the contribution of these charges to the competitive ratio will be exactly $\mu(i)\psi$. We estimate the number of the heavy sets as follows.

Lemma 5 There are at least a(i) + 1 heavy sets.

Proof. Our assumption is that at the time i is received, a(i) sets from $\mathcal{S}_i - \mathcal{T}^*$ are already selected to $\mathcal{T}$. Thus when we compute $\mu(i)$ in a call to Stat($\mathcal{S}_i - \mathcal{T}$) in line A4 we can form set A from $\mathcal{S}_i \cap \mathcal{T}^*$ after excluding a(i) sets with the largest cost. If we did that, $\mu(i)$ would become the largest cost in $\mathcal{S}_i \cap \mathcal{T}^* - \mathcal{T}$, after excluding a(i) costs that are yet larger, so we indeed have at least a(i) + 1 sets of cost $\mu(i)$ or more, hence heavy. When we include other sets in A as well, the value of $\mu(i)$ can only decrease, and then the number of heavy sets can only increase. □

Therefore at most $\xi(i) - (a(i) + 1)\psi$ will be charged to account(i). Thus we need to show that $E[\xi(i) - (a(i) + 1)\psi]$ is sufficiently small.

The intuition is that when $\xi(i)$ is small, the charges cannot be made, and when $\xi(i)$ is large, the average value of a(i) is equally large and thus the probability of making charges is sufficiently small to assure a very small average value.

In the next subsection we analyze these probabilities, but it is easy to see that the higher $\psi$, the smaller $E[\xi(i) - (a(i) + 1)\psi]$. We want to set the average charge to account(i) in such a way that the expected contribution of these accounts to the competitive ratio is at most $\log_2 m$. So the question is: how large a portion of $c(\mathcal{T}^*)$ can we attribute to element i?

To simplify our calculations, we rescale the costs of sets so $\mu(i) = 1$ and thus $c_S \ge 1$ for each heavy set S and the sum of charges due to line A9 is simply $\xi(i)$.

We associate with i a piece Ci(T*) of the optimum cost c(T*): 



Thus we can charge account[i) in such a way that on average it receives (K/d) log2 m, and let 
= (K/d) log2 m. In the next subsection, we find a suflficiently high value of \p to make it so. For 

now observe that the competitive ratio will be $1 + (3 + \psi)\log_2 m$: 1 for the charges to account($\mathcal{T}^*$), $\log_2 m$ for the charges due to line A12, $\log_2 m$ for the charges to the accounts account(i), $\log_2 m$ for A9b charges and $\psi\log_2 m$ for A9a charges.




se-Sinr* 



It is then easy to verify that 




ieJ 
3.5 Split of A9a charges between i and the heavy sets

In this section we prove that for $\psi = \max\{2, \ln D - 1\}$ we have $E[\xi(i) - (a(i) + 1)\psi] \le D^{-1}$. Define

$$\delta(i, b) = \begin{cases} 1 & \text{if } a(i) \le b \\ 0 & \text{otherwise} \end{cases}$$



Let charge$(i, \psi, \ell, x)$ be the formula for the charge to account(i) assuming we use $\psi$ with $\ell\psi \le x = \xi(i) < (\ell + 1)\psi$. We can estimate charge$(i, \psi, \ell, x)$ in the following manner:

• If $\delta(i, \ell - 1) = 1$, then $a(i) + 1 \le \ell$; in the extreme case $a(i) + 1 = \ell$ the total charge to all the heavy sets is $\ell\psi$ and thus we have to charge account(i) with $x - \ell\psi$.

• If $\delta(i, \ell - 2) = 1$ then we also have $\delta(i, \ell - 1) = 1$, so we charged account(i) with $x - \ell\psi$ already, but we need to charge account(i) with an additional amount of $\psi$.

• Continuing in a similar manner, it follows that for each $b \le \ell - 2$, if $\delta(i, b) = 1$ we charge account(i) with an additional amount of $\psi$.

Thus we get the following estimate:

$$E[\mathrm{charge}(i, \psi, \ell, x)] = \Pr[\delta(i, \ell - 1) = 1] \cdot (x - \ell\psi) + \psi\sum_{b=0}^{\ell-2}\Pr[\delta(i, b) = 1].$$

Since $\psi(a(i) + 1) \le \xi(i)$ and $\psi \ge 2$, $a(i) + 1$ is at most $\xi(i)/2$. Thus, we can use Lemma 13 with $X = x = \xi(i)$ and $a = j$ to obtain $\Pr[\delta(i, j) = 1] < e^{-x}x^j/j!$ for $j = \ell - 1, \ell - 2, \ldots, 0$. Let $C(\psi, \ell, x)$ be the estimate of $E[\mathrm{charge}(i, \psi, \ell, x)]$ thus obtained:

$$C(\psi, \ell, x) = e^{-x}\Bigl(\frac{x^{\ell-1}}{(\ell-1)!}(x - \ell\psi) + \psi\sum_{j=0}^{\ell-2}\frac{x^j}{j!}\Bigr)$$

Lemma 6 If $\psi \ge 2$, $x \ge \psi$ and $\ell = \lfloor x/\psi \rfloor \ge 1$ then $C(\psi, \ell, x) \le e^{-(\psi+1)}$.

Proof. We first consider the case of $\ell = 1$. Because $\delta(i, -1) = 1$ is not possible, charge$(i, \psi, 1, x) = \delta(i, 0)(x - \psi)$ and $C(\psi, 1, x) = e^{-x}(x - \psi)$. Now since $\frac{\partial}{\partial x}C(\psi, 1, x) = e^{-x}(-x + \psi + 1)$, $C(\psi, 1, x)$ is maximized for $x = \psi + 1$ with a maximum value of $e^{-(\psi+1)}$.

For $\ell \ge 2$ the summation part of the formula for $C(\psi, \ell, x)$ is non-trivial; in that case one can calculate that

$$\frac{\partial}{\partial x}C(\psi, \ell, x) = e^{-x}\frac{x^{\ell-2}}{(\ell-1)!}\bigl(-x^2 + \ell(\psi + 1)x - (\ell^2 - 1)\psi\bigr).$$

As we see, this derivative is a product of a positive function with a trinomial. This trinomial has its maximum for $x = \ell(\psi + 1)/2$, so in our range, $\ell\psi \le x < (\ell + 1)\psi$, it is decreasing. For $x = \ell\psi$ the value of the trinomial is $\psi > 0$, and for $x = \ell\psi + 2/\ell$ the value of the trinomial is $2 - \psi - 4\ell^{-2} < 0$. Therefore the maximum must occur in the interval between $\ell\psi$ and $\ell\psi + 2/\ell$ and it will suffice to prove our claim in this range.

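For $x = \ell\psi + z$ with $0 \le z \le 2/\ell$ the inequality we want to prove is equivalent to

$$\mathrm{LHS} = \frac{(\ell\psi + z)^{\ell-1}}{(\ell-1)!}\,z + \psi\sum_{j=0}^{\ell-2}\frac{(\ell\psi + z)^j}{j!} \le e^{(\ell-1)\psi + z - 1} = \mathrm{RHS} \qquad (1)$$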
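Suppose that (1) is true for some $\psi$; then for $\psi' = \psi + \varepsilon$ the RHS increases by a factor of $e^{(\ell-1)\varepsilon}$ while each monomial $(\ell\psi + z)^j/j!$ for $j = 0, 1, \ldots, \ell - 1$ increases by a factor of $\bigl(1 + \frac{\ell\varepsilon}{\ell\psi + z}\bigr)^j \le \bigl(1 + \frac{\varepsilon}{\psi}\bigr)^{\ell-1} < e^{(\ell-1)\varepsilon/\psi}$, and thus the entire LHS increases by a factor of at most $\bigl(1 + \frac{\varepsilon}{\psi}\bigr)e^{(\ell-1)\varepsilon/\psi} < e^{\ell\varepsilon/\psi} \le e^{(\ell-1)\varepsilon}$, where the last step uses $\psi \ge 2$ and $\ell \ge 2$. Because the LHS increases less than the RHS, the inequality for $\psi$ implies that for $\psi + \varepsilon$ and thus for every higher value. For this reason it suffices to prove the inequality for $\psi = 2$ and for $\ell\psi \le x \le \ell\psi + 2/\ell$ (thus, for $0 \le z \le 2/\ell$). For $\psi = 2$, our claim reduces to

$$\mathrm{LHS} = \frac{(2\ell + z)^{\ell-1}}{(\ell-1)!}\,z + 2\sum_{j=0}^{\ell-2}\frac{(2\ell + z)^j}{j!} \le e^{2\ell + z - 3} = \mathrm{RHS}$$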

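For convenience, let $y = 2\ell + z$. Thus, we need to prove

$$\mathrm{LHS} = \frac{y^{\ell-1}}{(\ell-1)!}(y - 2\ell) + 2\sum_{j=0}^{\ell-2}\frac{y^j}{j!} \le e^{y-3} = \mathrm{RHS}$$

subject to $2\ell \le y \le 2\ell + 2/\ell$. Since $\ell \ge 2$, $y \le 2\ell + 2/\ell < 2(\ell + 1)$ and thus $y - 2\ell < 2$. Thus $\mathrm{LHS} < 2\sum_{j=0}^{\ell-1} y^j/j!$, and since, by the well-known series expansion, $e^y = \sum_{j=0}^{\infty} y^j/j!$, it suffices to show that

$$2e^3\sum_{j=0}^{\ell-1} T_j \le \sum_{j=0}^{\infty} T_j$$

for $\ell \ge 2$, $2\ell \le y \le 2\ell + 2/\ell$ and $T_j = y^j/j!$. First, we verify by induction that $T_j > \sum_{i=0}^{j-1} T_i$ for $1 \le j \le \ell$. Note that for $1 \le j \le \ell$, $T_j/T_{j-1} = y/j \ge 2$. For the basis case of $j = 1$, it is therefore obvious. Otherwise, $T_j \ge 2T_{j-1} > T_{j-1} + \sum_{i=0}^{j-2} T_i = \sum_{i=0}^{j-1} T_i$ by the inductive hypothesis. Thus, it suffices to show that

$$2e^3\,T_\ell \le \sum_{j=0}^{\infty} T_j.$$

For $\ell + 1 \le j \le 2\ell$, $T_j/T_{j-1} = y/j \ge 1$. Thus, $\sum_{j=0}^{\infty} T_j \ge \ell\,T_\ell$ and thus it suffices to show that $2e^3\,T_\ell \le \ell\,T_\ell$, which holds provided $\ell \ge 2e^3 \approx 40.17$. Thus, the claim holds for $\ell > 40$.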

For $2 \le \ell \le 40$ and $\psi = 2$, we can verify our claim by easy numerical calculation. Notice that we just need to verify $C(2, \ell, x_0) < e^{-3}$ where $x_0$ is the real root of the quadratic function $f(x) = -x^2 + 3\ell x - 2(\ell^2 - 1)$ that lies in the range $2\ell \le x \le 2\ell + 2/\ell$. By numerical calculation, one can tabulate the results as shown in Table 1 and verify that $C(2, \ell, x_0) \le 0.049 < e^{-3}$.

□ 
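The numerical verification for $2 \le \ell \le 40$ can be sketched in a few lines. The code below is a minimal illustration, not the authors' program: it assumes the closed form $C(\psi,\ell,x) = e^{-x}\bigl(\frac{x^{\ell-1}}{(\ell-1)!}(x-\ell\psi) + \psi\sum_{j=0}^{\ell-2}\frac{x^j}{j!}\bigr)$, whose $x$-derivative has the trinomial factor $-x^2 + \ell(\psi+1)x - (\ell^2-1)\psi$, and the function names are hypothetical.

```python
import math

def stationary_point(ell: int, psi: float = 2.0) -> float:
    """Larger root of -x^2 + ell*(psi+1)*x - (ell^2-1)*psi (x0 when psi = 2)."""
    b = ell * (psi + 1.0)
    c = (ell * ell - 1.0) * psi
    return (b + math.sqrt(b * b - 4.0 * c)) / 2.0

def charge_estimate(psi: float, ell: int, x: float) -> float:
    """Assumed closed form of C(psi, ell, x) described in the lead-in."""
    head = x ** (ell - 1) / math.factorial(ell - 1) * (x - ell * psi)
    tail = psi * sum(x ** j / math.factorial(j) for j in range(ell - 1))
    return math.exp(-x) * (head + tail)

def table_rows(psi: float = 2.0, lo: int = 2, hi: int = 40):
    """Rows (ell, x0, C(psi, ell, x0)) in decreasing order of ell."""
    rows = []
    for ell in range(hi, lo - 1, -1):
        x0 = stationary_point(ell, psi)
        rows.append((ell, x0, charge_estimate(psi, ell, x0)))
    return rows

if __name__ == "__main__":
    for ell, x0, value in table_rows():
        print(f"{ell:2d}  {x0:10.6f}  {value:.18f}")
```

Under these assumptions every computed value stays below $e^{-3} \approx 0.0498$, with the largest value attained at $\ell = 2$.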

Now, since $\psi = \max\{2, \ln D - 1\} \ge 2$ we conclude using Lemma 6 that the average charge to account(i) is at most $e^{-(\psi+1)} \le e^{-\ln D} = D^{-1}$.

4 Improved Analysis of Algorithm A-Universal for Unweighted 
Cases 

In this section, we provide improved analysis of the expected competitive ratios of Algorithm A-Universal or its minor variation for the unweighted cases of the online set multicover problems. These improvements pertain to providing improved constants in the bound for E[r(m, d, k)]. The following notations will be used in this section:
$\ell$    $x_0$        $C(2, \ell, x_0)$
40    80.049938    0.000000267802482750
39    78.051215    0.000000367770130466
38    76.052559    0.000000505162811918
37    74.053975    0.000000694037963620
36    72.055470    0.000000953753092710
35    70.057050    0.000001310973313578
34    68.058722    0.000001802442476141
33    66.060495    0.000002478811076980
32    64.062378    0.000003409926108503
31    62.064382    0.000004692144890365
30    60.066519    0.000006458452590756
29    58.068802    0.000008892105898008
28    56.071247    0.000012247826675415
27    54.073872    0.000016875076361489
26    52.076697    0.000023258920058581
25    50.079746    0.000032069930688629
24    48.083046    0.000044236337186173
23    46.086630    0.000061043767052413
22    44.090537    0.000084273925651732
21    42.094810    0.000116397546202183
20    40.099505    0.000160843029165595
19    38.104686    0.000222370693445282
18    36.110434    0.000307594429791974
17    34.116844    0.000425709065373619
16    32.124038    0.000589504628397967
15    30.132169    0.000816780125566277
14    28.141428    0.001132311971151022
13    26.152067    0.001570588251431389
12    24.164414    0.002179590204991318
11    22.178908    0.003025980931596380
10    20.196152    0.004202124182703906
 9    18.216991    0.005835328094363729
 8    16.242641    0.008099376451161879
 7    14.274917    0.011227174827357965
 6    12.316625    0.015519482245119539
 5    10.372281    0.021333034990024608
 4     8.449490    0.028995023101223379
 3     6.561553    0.038468799615120751
 2     4.732051    0.048129928161242959

Table 1: Verification of $C(2, \ell, x_0) < e^{-3}$ for $2 \le \ell \le 40$.
$\hat{T}$ is the set of elements of T for which line A5 was executed.



4.1 Improved performance bounds for OSC 

Theorem 7

$$E[r(m, d, 1)] \le \begin{cases} \log_2 m \,\ln d, & \text{if } m > 15 \\ (2 + \log_2 m)(1 + \ln d), & \text{otherwise} \end{cases}$$

In the rest of the section, we prove the above theorem via a series of claims. Note that for OSC we substitute $\mu = c_S = k = 1$ in the pseudocode of Algorithm A-Universal and that deficit $\in \{0, 1\}$.



Lemma 8 For any $T \in \mathcal{T}^*$,

$$E[|\hat{T}|] \le \begin{cases} \log_2 m, & \text{if } m > 7 \\ \log_2 m + 2, & \text{otherwise} \end{cases}$$

Proof. We can use the proof of Lemma 4 with small exceptions. The sequence of probabilities that are computed always doubles the previous one, so for $z \ge 1$ we always use probability 1 and as a result, $F(z) = \log_2 z + 1$, and thus $F(1) = 1$. Similarly, for $\frac{1}{2} \le z < 1$ we have $F(z) = z(\log_2 z + 1) + (1 - z)(\log_2 z + 2) = \log_2 z + 2 - z$, and thus $F(\frac{1}{2}) = \frac{1}{2}$. In turn, $E[|\hat{T}|] \le \log_2 m + F(1/m)$, so for $m \ge 2$ we have $E[|\hat{T}|] \le \log_2 m + 2$ and for $m > 7$ we have $E[|\hat{T}|] \le \log_2 m$.



Obviously $E[|\mathcal{T}|]$ is equal to the sum of probabilities used in line A12 plus the number of times we execute line A12. Let $\xi(i)$ be the value of $\sigma\alpha p[i]$ at the time the algorithm receives element i as the input. If the test of line A2 is false, the sum of probabilities used in line A6 is $\xi(i) + 1$, while by Lemma 13 with $a = 0$ line A12 is executed with probability at most $1/e < 0.37$, so the contribution of i to the expected cost is smaller than $\xi(i) + 1.37$.



Lemma 9 Fori ^T*, if |T| > then E ^.^^ £,(i 



< E 



|T|] ( 



In ITI -InE 



Proof. Before the condition in line A2 is evaluated for element i the algorithm performs independent random selections of sets from $\mathcal{S}_i$ with the sum of probabilities of success equal to $\xi(i)$. By Lemma 13 with $a = 0$ the probability that all these selections fail, and thus the test in line A2 is false, is $\Pr[i \in \hat{T}] \le e^{-\xi(i)}$.

Let r be a parameter to be established later, and let $\zeta(i) = \max\{0, \xi(i) - \ln|T| + r\}$. Clearly,

$$E\Bigl[\sum_{i \in \hat{T}} \xi(i)\Bigr] \le E[|\hat{T}|]\,(\ln|T| - r) + \sum_{i \in T}\Pr[i \in \hat{T}]\,\zeta(i).$$

Let $T' = \{i \in T : \zeta(i) > 0\}$. Then

$$\sum_{i \in T}\Pr[i \in \hat{T}]\,\zeta(i) \le \sum_{i \in T'} e^{-(\zeta(i) + \ln|T| - r)}\zeta(i) = |T|^{-1}e^{r}\sum_{i \in T'} e^{-\zeta(i)}\zeta(i) \le e^{r-1}$$

where the last inequality follows from Fact 2 and $T' \subseteq T$. Thus,

$$E\Bigl[\sum_{i \in \hat{T}} \xi(i)\Bigr] \le E[|\hat{T}|]\,(\ln|T| - r) + e^{r-1}.$$

We can use $r = 1 + \ln E[|\hat{T}|]$ to get the desired estimate.



Now, we are ready to finish the proof of the claim on E[r(m, d, 1)] in the theorem.



E[r(m,d,l)] = ^ < 



i:Ter*E[i:tgT^W+1.37] 



< 



-Ter 



.E[|T|](ln|T|-lnE[|T|]+1.37) 



ITI 



ln|T|-lnE |T| 



+ 1 



.37) 



(by Lemma 9) 



The last quantity is an increasing function of $E[|\hat{T}|]$, so we can replace it with its overestimate. For every $m \ge 2$ we can use estimate $E[|\hat{T}|] \le 0.5 + \log_2 m$ and the fact that $\ln(0.5 + \log_2 2) > 0.37$. For $m \ge 16$ we can use estimate $E[|\hat{T}|] \le \log_2 m$ and the fact that $\ln\log_2 16 > 1.37$.



4.2 Improved performance bounds for OSC_k

Note that for OSC_k we substitute $\mu = c_S = 1$ in the pseudocode of Algorithm A-Universal and that deficit $\in \{0, 1, 2, \ldots, k\}$. For improved analysis, we change Algorithm A-Universal slightly, namely, line A6 (with $\mu = c_S = 1$)

A6 $p[S] \leftarrow \min\{(\alpha p[S] + |\mathcal{S}_i|^{-1}),\, 1\}$ // probability for this step //

is changed to

A6' $p[S] \leftarrow \min\{(\alpha p[S] + \mathrm{deficit} \cdot |\mathcal{S}_i|^{-1}),\, 1\}$ // probability for this step //
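The difference between lines A6 and A6' can be illustrated with a one-line helper. This is only a sketch for the unweighted case ($\mu = c_S = 1$); the function and parameter names are hypothetical, and `alpha_p` stands for the accumulated value $\alpha p[S]$ that the algorithm maintains elsewhere.

```python
def step_probability(alpha_p: float, family_size: int, deficit: int = 1) -> float:
    """Selection probability of one set S among family_size candidate sets.

    deficit = 1 corresponds to line A6, a general deficit to line A6'.
    """
    return min(alpha_p + deficit / family_size, 1.0)

def added_mass(family_size: int, deficit: int) -> float:
    """Total of the extra terms deficit/|S_i| summed over the whole family."""
    return sum(deficit / family_size for _ in range(family_size))
```

Summed over all candidate sets, the extra term contributes exactly `deficit` before capping at 1, so after capping it adds at most deficit to the sum of probabilities.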

Theorem 10 With the above modification of Algorithm A-Universal, 
E[rfmdk)]<| (2 + log2m)-(21nf +3.4)+1+21og2m ^/k<(2e).d 
' ' \ ^ +21og2Ta otherwise 

We now proceed with the proof of the above theorem. As before, $\mathcal{T}^*$ is an optimal solution and for $T \in \mathcal{T}^*$ we define $\hat{T}$ as the set of elements of T for which line S3 was executed. Since Lemma 8 is still true with the same proof, we have $E[|\hat{T}|] \le \log_2 m + 2$ for all m.

We will distribute the average cost of the obtained solution as follows. Each element of T gives a charge to T and a charge to its elements. If the algorithm has received the set of elements $X \subseteq U$, then clearly $|\mathcal{T}^*| \ge |X|k/d$; our goal is to give charges to the elements so that their expected sum equals $|X|k/d \le |\mathcal{T}^*|$.

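We will again perform an analysis of the average cost of receiving an element i for which the test in line A2 is false. We define or redefine the following notations:

$\sigma\alpha p[i] = \sum_{S \in \mathcal{S}_i - \mathcal{T}^*} \alpha p[S]$;

$\xi(i)$ is the value of $\sigma\alpha p[i]$ when line A1 is executed for i;

$\beta(i) = |(\mathcal{S}_i \cap \mathcal{T}^*) - (\mathcal{S}_i \cap \mathcal{T})|$;
$\psi(i) = |(\mathcal{S}_i \cap \mathcal{T}) - (\mathcal{S}_i \cap \mathcal{T}^*)|$;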
The value of deficit in line A1 is at most $\beta(i) - \psi(i)$. Element i will belong to some $\hat{T}$ only if $\psi(i) < \beta(i)$. We will view $\xi(i)$ and $\beta(i)$ as fixed parameters of the event when i is received. The quantity $\psi(i)$ is the number of successes in independent trials with success probabilities that add to $\xi(i)$. Let $p(i) = \Pr[\psi(i) < \beta(i)]$.

We charge element i with a value of $\pi_e(i) = k/(d\,p(i))$. The intuition is that, because we make this charge with probability p(i), on average it equals $p(i)\pi_e(i) = k/d$ and the sum of these charges therefore cannot be larger than $|\mathcal{T}^*|$. We then distribute the remaining cost equally among the $\beta(i) > \psi(i)$ many elements of $(\mathcal{S}_i \cap \mathcal{T}^*) - (\mathcal{S}_i \cap \mathcal{T})$.

Clearly, each value of deficit computed in line A1 and in line A10 cannot exceed $\beta(i)$. The term deficit $\cdot\,|\mathcal{S}_i|^{-1}$ in line A6' adds at most deficit to the sum of probabilities computed in line A6', thus the cost attributable to this term, as well as the cost due to line A12, add to at most 2 per $T \in \mathcal{T}^*$. It remains to estimate the cost due to the terms $\alpha p[S]$. We decrease this cost by the charge made to i, so each set $T \in \mathcal{T}^*$ such that $i \in T$ receives a charge of at most

$$\pi_s(i) = \max\Bigl\{0, \frac{\xi(i) - \pi_e(i)}{\beta(i)}\Bigr\} = \max\Bigl\{0, \frac{\xi(i)}{\beta(i)} - \frac{k}{d\,p(i)\,\beta(i)}\Bigr\}.$$

The expected number of sets selected by us is therefore at most 

< iri-Li6T^s(i) + 2-|T|-ir*i + ^ 

< ((2 + log2m)7ts(i)+21og2m+1) -in 

which means we need to estimate the quantity $\pi_s(i)$. For this, we first need to calculate a bound for p(i). Remember that $\psi(i)$ is the number of successes of a set of independent trials with success probabilities that add up to $\xi(i)$. The standard Chernoff bound theorem [9, 16] states that if we have a set of independent trials with the sum of success probabilities $\mu$, the probability that the number of successes is below $(1 - \delta)\mu$ is below $e^{-\mu\delta^2/2}$. In our case, $\mu = \xi(i)$ and $(1 - \delta)\mu$ is $\beta(i)$.

We introduce the following notations for simplicity: $\beta = \beta(i)$, $\phi = \xi(i)/\beta$ and $\kappa = d/k$. Now $\mu = \phi\beta$ and $\delta = (\phi - 1)/\phi$, thus via the Chernoff bound we have $p(i) < e^{-(\phi-1)^2\beta/(2\phi)}$. Hence

7rs(i) < max<^ 0, c|) -e 2* p ^ < max 0, cj) -e^2 >^ 

I Kp J I k3 

By using simple calculus and the fact that |3 > 1 , it can be shown that the maximum value of the 
function i[<^) = 4^ — tag' 2"~^'P is at most 21n k + 21n(2e) < 21n k + 3.4. This shows that 



K(3 

7ts(i) < 



21nK + 3.4 ifk<(2e) 
otherwise 



4.3 Lower bounds on competitive ratios for OSC_k and WOSC_k

Lemma 11^ There exists an instance with m = |>S| sets over n = |V| elements such that for any 
fixed 6 > any deterministic algorithm must have a competitive ratio of 

(i) ^ ( logiog^f+Sog^ ) O^Cy provided klog2 ^ < m < (k+ l)e(^)^ ' anrf k < min{m,n}; 

(") ^ ( logiogTH-Sogn ) for WOSC, provided 

k + log2(n-l - [log2(k+l)]) < m< k + e(^-i-n°S2(k+i)l)^-' and k < ^ ■ min{m, 2^-l }. 

*The relationships between m, n and k were referred to as "for almost all values of the parameters" before. 
Proof.

(i) Alon et al. [2] provided an instance of OSC with m' sets and n' elements such that the optimal (offline) cover contains just one set but any online cover must use $\Omega\bigl(\frac{\log m' \log n'}{\log\log m' + \log\log n'}\bigr)$ sets as long as $\log_2 n' \le m' \le e^{n'^{1/2-\delta}}$ for any fixed $\delta > 0$. Consider a given k. We will use one additional element x and k additional sets such that x appears in all these sets. To make these k sets mutually different, we will use an additional $\lceil\log_2(k+1)\rceil$ elements (which we will never present) and add a distinct subset of these additional elements to each of the k sets. We will also have k copies of the instance of Alon et al. [2] with elements renamed to make each copy distinct from the rest; each element of each copy is also added to exactly $k - 1$ of the k additional sets we mentioned at first. The total number of elements n satisfies $kn' \le n = kn' + 1 + \lceil\log_2(k+1)\rceil \le (k + 1)n'$, and the total number of sets is $m = k + km' \le (k + 1)m'$ since $k \le m$. We first present the element x to force the online algorithm to select the k additional sets; these sets also cover any element in the k copies of Alon et al. [2] exactly $k - 1$ times. After this, we present the elements in the k copies of Alon et al. [2] following their scheme, presenting elements in one copy completely before presenting elements in the next copy. Now the optimal uses at most 2k sets, whereas by a reasoning similar to that in Alon et al. [2] any online algorithm must use $\Omega\bigl(k + k \cdot \frac{\log m' \log n'}{\log\log m' + \log\log n'}\bigr)$ sets; thus the performance ratio is at least $\Omega\bigl(\frac{\log m' \log n'}{\log\log m' + \log\log n'}\bigr) = \Omega\bigl(\frac{\log\frac{m}{k}\,\log\frac{n}{k}}{\log\log\frac{m}{k} + \log\log\frac{n}{k}}\bigr)$. Moreover, the relationship between m and n is given by



1 



k-log2r^ < k-log2n' < km' < m < (k + 1 )m' < (k+ 1) • e''"''^ " < (k+1) -e^i^J' 
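The padding construction in part (i) can be sketched as follows. This is only an illustration under stated assumptions: `base_sets` is a stand-in for the instance of Alon et al. [2] (which is not reproduced here), the padding elements are assigned by the binary representation of each set's index, and the rule for which of the k additional sets skips a given element is an arbitrary round-robin choice. All names are hypothetical.

```python
def build_multicover_instance(base_sets, k):
    """Pad an OSC instance into an OSC_k instance along the lines of part (i).

    base_sets: list of sets of comparable, hashable elements.
    Returns (extra_sets, copied_sets).
    """
    pad_count = k.bit_length()  # equals ceil(log2(k + 1)) for k >= 1
    # k additional sets: each contains x plus a distinct non-empty subset of
    # the padding elements, chosen from the binary digits of its index.
    extra_sets = [
        {"x"} | {("pad", b) for b in range(pad_count) if (idx >> b) & 1}
        for idx in range(1, k + 1)
    ]
    base_elements = sorted(set().union(*base_sets))
    copied_sets = []
    for copy in range(k):
        # Rename elements so that the k copies are pairwise disjoint.
        copied_sets.extend({(copy, e) for e in s} for s in base_sets)
        for pos, e in enumerate(base_elements):
            skipped = (pos + copy) % k  # exactly one additional set omits e
            for idx in range(k):
                if idx != skipped:
                    extra_sets[idx].add((copy, e))
    return extra_sets, copied_sets
```

Once x has been presented and the k additional sets are selected, every element of every copy is covered exactly k - 1 times, so each copy behaves like a fresh instance with coverage factor 1.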

(ii) We again use one additional element x plus $\lceil\log_2(k+1)\rceil$ additional elements (that we will never present) to create k additional sets such that x appears in all these sets. We set the cost of each of these sets to be arbitrarily close to zero, say $\varepsilon$. This time we just use one copy of the instance of Alon et al. [2] with each set of cost 1 and, as before, each element of this copy is also added to exactly $k - 1$ of the k additional sets we mentioned at first. The total number of elements n satisfies $n' \le n = n' + 1 + \lceil\log_2(k+1)\rceil$, and the total number of sets m satisfies $m' \le m = k + m'$. We again first present the element x to force the online algorithm to select the k additional sets; these sets also cover any element in the copy of Alon et al. [2] exactly $k - 1$ times. After this, we present the elements in the copy of Alon et al. [2] with n' elements and m' sets following their scheme. Overall, the optimal uses sets of total cost $1 + \varepsilon$ whereas by a reasoning similar to that in Alon et al. [2] any online algorithm must use sets of total cost at least $\varepsilon + \Omega\bigl(\frac{\log m' \log n'}{\log\log m' + \log\log n'}\bigr)$; setting $\varepsilon$ to be sufficiently small we achieve a competitive ratio of



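$$\Omega\Bigl(\frac{\log m' \log n'}{\log\log m' + \log\log n'}\Bigr) = \Omega\Bigl(\frac{\log(m-k)\,\log(n-1-\lceil\log_2(k+1)\rceil)}{\log\log(m-k) + \log\log(n-1-\lceil\log_2(k+1)\rceil)}\Bigr) = \Omega\Bigl(\frac{\log m \log n}{\log\log m + \log\log n}\Bigr)$$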



where the last equality holds since k < ^ •min{m,2^ ^}. Moreover, the relationship between m and 
n is given by 

$$k + \log_2\bigl(n - 1 - \lceil\log_2(k+1)\rceil\bigr) = k + \log_2 n' \le k + m' = m \le k + e^{n'^{1/2-\delta}} = k + e^{(n-1-\lceil\log_2(k+1)\rceil)^{1/2-\delta}}$$
Appendix

A Some combinatorial and probabilistic facts and results 

Fact 1 If f is a non-negative integer random function, then $E[f] = \sum_{i \ge 1} \Pr[f \ge i]$.

Fact 2 The function $f(x) = xe^{-x}$ is maximized for x = 1.
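Both facts are easy to check on small examples. The snippet below is an illustrative check only: the distribution `pmf` is an arbitrary example chosen for this sketch, exact rational arithmetic is used for Fact 1, and Fact 2 is sampled on a grid rather than proved.

```python
import math
from fractions import Fraction

def expectation(pmf):
    """E[f] for a distribution given as {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())

def tail_sum(pmf):
    """Sum over i >= 1 of Pr[f >= i], the right-hand side of Fact 1."""
    top = max(pmf)
    return sum(
        sum(prob for value, prob in pmf.items() if value >= i)
        for i in range(1, top + 1)
    )

# Arbitrary example distribution on non-negative integers.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(1, 4), 5: Fraction(1, 4)}

def x_exp_minus_x(x: float) -> float:
    """The function of Fact 2."""
    return x * math.exp(-x)
```

For this example both sides of Fact 1 equal 17/8, and on the grid 0, 0.01, ..., 10 no value of $xe^{-x}$ exceeds the value $e^{-1}$ attained at x = 1.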

The subsequent lemmas deal with N independent 0-1 random variables $\tau_1, \ldots, \tau_N$ called trials; the event $\{\tau_i = 1\}$ is the success of trial number i and $s = \sum_{i=1}^{N} \tau_i$ is the number of successful trials.

Let $x_i = \Pr[\tau_i = 1] = E[\tau_i]$ and $X = \sum_{i=1}^{N} x_i = E[s]$.

Lemma 12 If $0 < 2a \le X + 1$ then $\Pr[s = a] \ge \Pr[s = a - 1]$.

Proof. Our elementary events are 0/1 vectors $t = (t_1, \ldots, t_N)$. Let $E_a$ be the event $\{s = a\}$, i.e. the set of elementary events with a 1's. Given $t \in E_{a-1}$ we can form an elementary event from $E_a$ by converting some 0 into 1. If we do it with $t_i$, call the result $t^i$; observe that $\Pr[t^i] \ge x_i\Pr[t]$. Therefore the sum of probabilities of elementary events formed from t is at least $\Pr[t]\sum_{i:\,t_i = 0} x_i \ge (X - a + 1)\Pr[t] \ge a\Pr[t]$.

This shows that the sum of probabilities of the multi-set of elementary events formed from elements of $E_{a-1}$ is at least $a\Pr[E_{a-1}]$; in turn, every element in this multi-set belongs to $E_a$ and it is present in this multi-set exactly a times. Thus $\Pr[E_a] \ge a^{-1}a\Pr[E_{a-1}]$. □
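Lemma 12 can be illustrated by computing the exact distribution of the number of successes with a small dynamic program. The code is a sketch for experimentation, not part of the proof; the function names and the example probabilities are assumptions made here, and exact fractions are used so that the boundary case $2a = X + 1$ is not affected by rounding.

```python
from fractions import Fraction

def success_count_pmf(probs):
    """Exact distribution of s, the number of successes of independent trials."""
    pmf = [1]
    for x in probs:
        nxt = [0] * (len(pmf) + 1)
        for count, mass in enumerate(pmf):
            nxt[count] += mass * (1 - x)    # trial fails
            nxt[count + 1] += mass * x      # trial succeeds
        pmf = nxt
    return pmf

def lemma12_holds(probs):
    """Check Pr[s = a] >= Pr[s = a - 1] for every a with 0 < 2a <= X + 1."""
    pmf = success_count_pmf(probs)
    total = sum(probs)
    return all(
        pmf[a] >= pmf[a - 1]
        for a in range(1, len(probs) + 1)
        if 2 * a <= total + 1
    )

# Example success probabilities with X = 3, so the lemma covers a = 1 and a = 2.
example = [Fraction(3, 10), Fraction(9, 10), Fraction(1, 2), Fraction(7, 10), Fraction(3, 5)]
```

For `example` the probabilities of 0, 1 and 2 successes are non-decreasing, as the lemma predicts.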

Lemma 13 If $0 \le a \le X/2$ then $\Pr[s \le a] < e^{-X}X^a/a!$.

Proof. The case of a = 0 is easy since $\Pr[s \le 0] = \prod_{i=1}^{N}(1 - x_i) \le \prod_{i=1}^{N} e^{-x_i} = e^{-X}$. So, we assume in the remaining that a > 0.

We will show how to alter the probabilities so that X remains constant and $\Pr[s \le a]$ does not decrease. Let $x_0 = x_1 + x_2$, $s' = s - \tau_1 - \tau_2$ and let $q_a = \Pr[s' \le a]$. We assume that $x_0 \le 1$. Then

$$\begin{aligned}
\Pr[s \le a] &= \Pr[\tau_1 = \tau_2 = 0 \;\&\; s' \le a] + \Pr[\tau_1 + \tau_2 = 1 \;\&\; s' \le a-1] + \Pr[\tau_1 = \tau_2 = 1 \;\&\; s' \le a-2] \\
&= (1 - x_1)(1 - x_2)q_a + [(1 - x_1)x_2 + x_1(1 - x_2)]q_{a-1} + x_1x_2q_{a-2} \\
&= (1 - x_0 + x_1x_2)q_a + (x_0 - 2x_1x_2)q_{a-1} + x_1x_2q_{a-2} \\
&= [P = (1 - x_0)q_a + x_0q_{a-1}] + x_1x_2(q_a - 2q_{a-1} + q_{a-2}) \\
&= P + x_1x_2(\Pr[s' = a] - \Pr[s' = a-1])
\end{aligned}$$

If we keep $x_1 + x_2$ fixed, P is constant and we maximize the latter expression when $x_1 = x_2$ (because $2a \le (X - x_1 - x_2) + 1$, by Lemma 12, the difference of probabilities in the parentheses is non-negative).

This shows that $\Pr[s = a]$ is maximized when all $x_i$'s are equal. We can "pad" the vector of $x_i$'s with zeros, i.e. add trials with zero probability of success. This shows that we can overestimate our probability when we go to the limit with $N \to \infty$ and all $x_i$'s equal to X/N. We can now finish the proof by observing the following from standard estimates in probability theory:

$$\lim_{N \to \infty}\frac{N!}{(N-a)!\,a!}\Bigl(\frac{X}{N}\Bigr)^{a}\Bigl(1 - \frac{X}{N}\Bigr)^{N-a} = \frac{e^{-X}X^a}{a!}$$

□ 






